April 1, 2016

Programming without Text

I am disappointed with the current state of programming, largely because it used to be one of my favourite pastimes. It’s not the logic, or the projects, or the prevailing attitudes, or even the languages employed, per se, which are the source of my disappointment.

Rather, it is the actual experience of programming which I feel falls short: so very short of what I feel it really could and ought to be in this day and age.

Essentially, we have artificially restricted ourselves to expressing program structure in the form of text documents. This is the key problem.

Using text as an expression mechanism is holding us back more than most engineers realize. Certainly, there are a host of benefits: it’s extraordinarily basic to deal with, highly interoperable, cut-paste works without a hitch, and every single programming language is designed to be edited in text.

But we all realize there are drawbacks - huge and serious drawbacks to editing in text. Most prominently, that it’s just incredibly tedious - especially for complex code bases, new libraries, or unfamiliar languages. And even then, there will always be a ‘gotcha’ of some hidden bug caused by one (sometimes invisible) piece of punctuation just slightly out of place.

Many people, of course, have gone to great lengths to improve the experience of programming in text. Eclipse, Visual Studio, even emacs provide considerable tools to hint at what the programmer might mean, or wishes to do. There are tools for integrated debugging, attempts to cross-reference documentation, and, occasionally, tools that interpret the language or the contents of comments, or provide type hints.

Some of the most modern approaches to fixing the editor problem are represented by Light Table, which allows you to edit text in several very small windows that you can move around. While this is sort of nifty, it falls short of my vision in several significant ways which I’ll describe shortly.

On Abstract Syntax Trees #

So what is the alternative to editing text? Every clever meta-engineer jumps to the same conclusion: write a structure editor where you can edit the Abstract Syntax Tree directly!

What’s an Abstract Syntax Tree (AST)? It’s the result of the step your compiler (or interpreter) takes immediately before dealing with any of the logic or meaning of your program. It’s the output of something called the lexer and parser of the language. Essentially, the abstract syntax tree holds a representation of the program in memory in a sort of meaningful-but-not-quite sense. None of the keywords or built-in functions will have been recognized yet, but the AST might contain a description along the lines of: There is a symbol, followed by a code block, containing another symbol, and a parenthesis set containing one argument which is a constant string.

In practice, I think ASTs vary quite a bit in the degree of semantic interpretation they contain. Many parsers do tag an if statement as its own kind of node, but in the simplest case (the output of a Lisp reader, say) the tree wouldn’t recognize an if statement as such, and would rather just contain the understanding that there is a symbol described by the characters i and f.

The interpreter or compiler then operates directly on the AST to apply meaning to the program and execute it.
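As a rough illustration of the description above, here is a minimal sketch of such a tree written as plain JavaScript objects. The node kinds (`sequence`, `symbol`, `block`, `parens`, `string`), the field names and the `outline` helper are all invented for this example; they are assumptions, not the output format of any particular parser.

```javascript
// Hypothetical tree for source text along the lines of:  main { print("hello") }
// Node kinds and field names are invented for this sketch.
const ast = {
  kind: "sequence",
  children: [
    { kind: "symbol", name: "main" },
    {
      kind: "block",
      children: [
        { kind: "symbol", name: "print" },
        { kind: "parens", children: [{ kind: "string", value: "hello" }] },
      ],
    },
  ],
};

// Walk the tree and return an indented outline, one line per node.
function outline(node, depth = 0) {
  const label = node.name ?? node.value ?? "";
  const line = "  ".repeat(depth) + node.kind + (label ? " " + label : "");
  const children = node.children ?? [];
  return [line, ...children.flatMap((child) => outline(child, depth + 1))];
}

console.log(outline(ast).join("\n"));
```

Note that nothing in the tree says what `main` or `print` mean; it only records which pieces sit inside which.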

So, an AST does indeed look like a natural candidate for doing some structured editing! It contains everything the engineer has authored in code, but has done away with most of the ridiculous gobbledygook of brackets and line returns and whitespace, and sort of put everything into the right buckets.

Sounds great! Let’s do it.

It turns out there actually have been several attempts to provide AST editors. Paredit in emacs comes somewhat close. I believe JetBrains has a product that implements a projectional editor which can sort of do this if it has the right description of the language.

Now, none of these efforts have really taken off in any kind of big way, and I think I know why: ROI. Return on investment. The difference between editing in text and editing in a structured editor could be a bit frustrating: the structured editor is going to stop you from doing certain things; it’s going to perhaps make copy-paste a bit strange, and at the end of the day, it’s really only solved a single problem we had before: syntax - and only partially!

At the level of an AST, a program has no semantic meaning. So a structured editor allowing you to construct an AST doesn’t have enough information to give you any real program structure to edit around, but instead will just try to make sure that the bits of the AST you’re putting together are vaguely allowable in the widest interpretation of the language. So, certainly, it’s going to prevent the problem of missing semicolons (because they would no longer be necessary), and it’s going to eliminate the unbalanced parenthesis problem (because the entire expression is one editable block), but it’s impossible, by definition, for an AST editor to make sure that any of your editable blocks make any kind of sense at all.

So, I think AST editors have a bad ROI: you have to invest in a new product, and a new way of thinking about editing your code, and all you get in return is balanced parentheses and no need to type semicolons. Worse, this sort of thing is simple enough that even the dumbest text editors have extensions that can do it for you if it actually ends up being such a painful part of your life.

At the end of the day, it’s not that AST editors have no value, it’s just relatively minimal compared to the kinds of issues programmers face and the difficulty of taking them away from their currently favoured setup.

So what happens if we go a step past the AST? What if we want to edit and operate on something like an actual semantic representation of the program? Well, as interpreter and compiler-writers will tell you, this doesn’t necessarily exist. At all. While the AST is easily identifiable in nearly every language, there is no particularly agreed-upon way of speaking about anything really past that. In many cases, there just isn’t another interpretation: things just get executed directly.

On Smalltalks #

Allow me to describe what I think this next-level semantic representation can look like. I think it actually looks a lot like (and is possibly identical to) a running program in memory. I want to interrogate objects that know they are classes, and ask about their constituents. I want to be able to interrogate functions and ask about their parameters. To this degree, we have only one well-known parallel, which is Smalltalk. Smalltalk is explicitly designed to spin up objects for editing all the time, and allow the programmer to modify them directly. Many other languages have the idea of runtime reflection, but vanishingly few provide the ability for a programmer to interactively edit anything. So Smalltalk is an almost unique example of being able to directly edit objects and functions as real and actual things in memory.

A word of caution here: Image-based languages are an interesting and powerful idea which allow the entire state of the program at any time to be serialized to disk and restarted at a later time, much like a virtual machine. Smalltalk is one of these, and there are several LISPs which do this as well. However, the bit about Smalltalk I’m extolling here doesn’t, I don’t think, necessarily require an entire Image-based language approach, since I’m not (yet) discussing the serialization of objects in real-time, on-the-fly, during program execution. I’m specifically referring to serializing the placement of all the dominoes before tipping one over: not serializing it all mid-flight.

So, Smalltalk is great. But allow me to identify one place where I think it falls quite short. Once you open an object, and you see all of its methods, and you decide to edit a function, guess what Smalltalk does?

It opens a text editor! All the object infrastructure in the world, super reflection capabilities, and a full-blown image-based VM approach, and it opens a text editor to edit functions! Perhaps we shouldn’t come down on Smalltalk too hard: it first appeared in 1972, and it still goes further than any other programming environment devised in terms of the experience in editing programs.

What’s needed, then, is a program-editing experience much like Smalltalk’s, but one that makes up for its little shortcoming of having to edit functions using text. But why? What can we ever hope to achieve other than semicolon fixes and parenthesis balancing?

On Naming Things #

There’s an old expression in the computer science community: “There are only two hard things in Computer Science: cache invalidation and naming things.” It’s attributed to someone named Phil Karlton, whose background I can’t quite determine just now.

To begin our exploration of things that would not just be better, but revolutionized by ending our reliance on text, there are few places better to start than naming.

When dealing with any other kind of computer system, or database, we don’t place a lot of reliance on text. If there’s a user named "Bob", and we want to credit his account, we don’t store something like ("Bob", +1). We don’t do anything like that at all. In fact, we beat the snot out of first-years who try to do anything even remotely close to this. The way this is accomplished in practice is that we allow the user to look up "Bob", and disambiguate which Bob they want to perform this important operation on! It might come back that there are fifteen million different Bobs in the system! So, we provide a beautiful interface for them to find the right Bob, and filter by age, postal code, e-mail and so on.

Eventually we find the right Bob. Do we store ("Bob McKay", +1) in the database? Absolutely not. We find the unique identifier referring to that Bob, and we link everything to that unique identifier. But, how complicated! How indirect! "Bob McKay" might be unique! Certainly, it might be now, but what if there’s another "Bob McKay" in the future? What if - what if - Bob changes his name?
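The lookup-then-link pattern can be sketched in a few lines. This is only an in-memory stand-in for a database: the `users` map, the numeric ids, the postal codes and the `findUsers` helper are all hypothetical names made up for the illustration.

```javascript
// Hypothetical in-memory "database": users are keyed by an opaque id,
// and the display name is just another field on the record.
const users = new Map([
  [1001, { name: "Bob McKay", postalCode: "A1A 1A1" }],
  [1002, { name: "Bob McKay", postalCode: "B2B 2B2" }],
]);
const credits = [];

// Look up by name, optionally narrowing with a filter; returns matching ids.
function findUsers(name, filter = () => true) {
  return [...users]
    .filter(([, user]) => user.name === name && filter(user))
    .map(([id]) => id);
}

// The name alone is ambiguous, so disambiguate by postal code.
const candidates = findUsers("Bob McKay");
const [bobId] = findUsers("Bob McKay", (u) => u.postalCode.startsWith("A1A"));

// The credit is linked to the id, never to the name.
credits.push({ userId: bobId, amount: 1 });

// Bob changes his name; the credit still points at the same person.
users.get(bobId).name = "Robert McKay";
const creditedName = users.get(credits[0].userId).name;
console.log(candidates.length, creditedName);
```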

This rather tortured tour of database design has several direct parallels to programming proper. Names (“symbols”) in programming are a complete disaster due to the sole and exclusive use of text as an editing mechanism. There’s an entire art, science and cottage industry of tools for code “refactoring”, one major use of which is to allow Bob to change his name (change the name of a symbol).

Due to ancient rules in every lexer, names in programming are required to be very silly. Most special characters are disallowed, and spaces are universally verboten. This ends up with, at best, names like numOfToysInTheBox, or helpfully, x, or counter, or idx.

But apart from requiring silly names and requiring you to run a sed script to change them across all your text files, there are considerably more serious problems with naming:

for (var i = 0; i < 10; i++) {
    doThing(i);
}
for (var i = 0; i < jokes.length; i++) {
    print(jokes[i]);
}

What if in the above case, you needed to move the first loop inside the second? What if the programmer was too overworked to notice they were employing the same name for a loop variable? This mistake is easy to make if the outer loop is obscured by a wall of code.

I’ll tell you what happens: half an hour of downtime on a production system because the system goes into an infinite loop for no discernible reason. Even our outrageously aggressive code linter didn’t pick up on this one, and after it happened three times in one week to different programmers, we ended up writing code to augment the linter to catch this exact situation.
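For the curious, here is a minimal sketch of the failure, assuming the snippet above is JavaScript (or a language with similarly function-scoped variables). The `nestedLoops` wrapper and its `maxOuterIterations` safety valve are additions for the demonstration so that it terminates instead of hanging.

```javascript
// The first loop has been moved inside the second. Both use `var i`,
// which in JavaScript is scoped to the enclosing function, so the two
// loops now share a single counter.
function nestedLoops(jokes, maxOuterIterations) {
  var outerIterations = 0;
  for (var i = 0; i < jokes.length; i++) {
    // Safety valve for the demo only: bail out instead of hanging.
    if (++outerIterations > maxOuterIterations) return "gave up";
    for (var i = 0; i < 10; i++) {
      // doThing(i) would go here
    }
    // The inner loop leaves i at 10, so the outer i++ always yields 11.
  }
  return "finished";
}

// With more than 11 jokes the outer condition never becomes false.
console.log(nestedLoops(new Array(20).fill("joke"), 1000)); // gave up
// With fewer, the loop exits after one pass, silently skipping jokes.
console.log(nestedLoops(new Array(5).fill("joke"), 1000)); // finished
```

In modern JavaScript, declaring each counter with `let` gives every loop its own block-scoped binding, which sidesteps this particular collision, though the binding is still decided by name and position in the text.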

All because text. Semantically, you have two separate loops with their own counters: the name of the loop variable is completely immaterial. Clearly, each loop was intended to employ its own loop counter when it was written. Why should moving a block of code suddenly change the allegiance of a variable?

What if… References to variables were actually linked by identifiers, instead of by names?
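One way to picture this is a toy program representation in which every reference points at a declaration object rather than spelling out a name. The sketch below is an assumption-laden illustration: the `loop` and `log` node constructors and the `execute` interpreter are hypothetical and exist only for this example.

```javascript
// Each declaration is its own object; its label is only for display.
const outerI = { label: "i" };
const innerI = { label: "i" };

// Hypothetical node constructors: a counting loop and a log statement.
const loop = (counter, limit, body) => ({ kind: "loop", counter, limit, body });
const log = (ref) => ({ kind: "log", ref });

// Values are stored per declaration object (a Map keyed by identity),
// so two counters that are both labelled "i" can never collide.
function execute(node, env = new Map(), out = []) {
  if (node.kind === "log") {
    out.push(`${node.ref.label}=${env.get(node.ref)}`);
  } else if (node.kind === "loop") {
    env.set(node.counter, 0);
    while (env.get(node.counter) < node.limit) {
      node.body.forEach((child) => execute(child, env, out));
      env.set(node.counter, env.get(node.counter) + 1);
    }
  }
  return out;
}

// Nest one loop inside the other: each keeps its own counter.
const inner = loop(innerI, 2, [log(innerI)]);
const program = loop(outerI, 3, [inner, log(outerI)]);
const trace = execute(program);
console.log(trace.join(" "));
```

Moving `inner` somewhere else in the tree changes where it runs, but not which counter it refers to.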

Allow me to identify another incredibly difficult case: method names. Two different interfaces might have the name print, but one refers to the command line, and the other means send the object to the printer. At this point, a class implementing both interfaces with a single print method would appear to satisfy both, but instead there’s an enormous latent bug in the program.

This also serves as the core of the Diamond Problem when attempting to perform multiple inheritance: names! What if there’s a name collision?

Some languages have now implemented the “feature” that protocols (or interfaces) have their method names automatically namespaced. Excellent! This development took forty years. I propose we entirely do away with namespacing, and allow symbols to be first-class data objects, with a property of having a name — for display purposes only. They could have a colour, too! They can display themselves however their little hearts desire, as long as at the end of the day the interpreter is able to discern which unique identifier they are referring to.
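JavaScript’s built-in `Symbol` type gives a small taste of this, although it is still written and edited as text. In the sketch below, the two interface objects and the `Report` class are hypothetical; only `Symbol` and its `description` property are part of the language.

```javascript
// Two hypothetical interfaces that both want a method called "print".
// Each Symbol() is a unique identifier; the string is only a description.
const ConsolePrintable = { print: Symbol("print") };
const PaperPrintable = { print: Symbol("print") };

class Report {
  constructor(text) {
    this.text = text;
  }
  // Same display name, two unrelated methods on one class.
  [ConsolePrintable.print]() {
    return `stdout: ${this.text}`;
  }
  [PaperPrintable.print]() {
    return `printer queue: ${this.text}`;
  }
}

const report = new Report("Q1 numbers");
console.log(report[ConsolePrintable.print]()); // stdout: Q1 numbers
console.log(report[PaperPrintable.print]()); // printer queue: Q1 numbers

// The name is for display; identity is what distinguishes the two.
console.log(ConsolePrintable.print.description); // print
console.log(ConsolePrintable.print === PaperPrintable.print); // false
```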

But would symbols just be global chaos? Sure! If you wished to code that way. But you could also impose whatever hierarchy or lookup structure on symbols you desired.

There’s yet more trouble in this vein, and it touches on naming, but goes beyond into more text-problematic coding. The Expression Problem was coined by a clever fellow at Bell Labs in 1998 and goes as follows: if you want to add a new function to a bunch of things (say, jump!), you have to go find all of those things and add a jump! function to them. If you want to add a new type of object, you have to go figure out all the functions it’s supposed to handle and perhaps edit all of them to support the new type. The problem is severely compounded if for some reason you’re not actually permitted to edit any existing objects.

I’m not really on board with the ‘not permitted to edit’ issue (which, I think, is mostly a restriction to permit name collisions anyway!), but the greater issue here is that there’s a two-dimensional matrix of function vs type where we’d like to have logic for each combination. The Object people want to solve it by having objects with a list of functions, and the function people want to solve it by having functions with a list of types inside them.

From my position, the entire thing looks completely asinine and the sort of thing that we wouldn’t have any need to waste the time of Bell Labs fellows on if we weren’t representing everything as text. I see no reason why you couldn’t define the behaviour, link it to both, and have it correctly show up in a program view no matter whether you choose an object-centric or function-centric perspective: the only differences are how you would represent the code, not what’s actually going to happen in program execution.
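A minimal sketch of the idea, assuming behaviours are kept in a single table of (type, operation) pairs, which is essentially a dispatch table. The types, operations and helper functions below are all hypothetical, and the sketch deliberately ignores the static type-safety side of the Expression Problem.

```javascript
// Types and operations are plain objects compared by identity;
// their labels are only for display. All names are hypothetical.
const Frog = { label: "Frog" };
const Horse = { label: "Horse" };
const jump = { label: "jump" };
const speak = { label: "speak" };

// One table of behaviours, owned by neither the types nor the operations.
const behaviours = []; // entries: { type, op, fn }

function define(type, op, fn) {
  behaviours.push({ type, op, fn });
}

function invoke(op, obj) {
  const entry = behaviours.find((b) => b.type === obj.type && b.op === op);
  if (!entry) throw new Error(`${obj.type.label} has no behaviour for ${op.label}`);
  return entry.fn(obj);
}

define(Frog, jump, (f) => `${f.name} hops`);
define(Frog, speak, (f) => `${f.name} croaks`);
define(Horse, jump, (h) => `${h.name} clears the fence`);

// Object-centric view: every operation defined for one type.
const opsFor = (type) => behaviours.filter((b) => b.type === type).map((b) => b.op.label);
// Function-centric view: every type that supports one operation.
const typesFor = (op) => behaviours.filter((b) => b.op === op).map((b) => b.type.label);

console.log(opsFor(Frog)); // [ 'jump', 'speak' ]
console.log(typesFor(jump)); // [ 'Frog', 'Horse' ]
console.log(invoke(jump, { type: Horse, name: "Ed" })); // Ed clears the fence
```

Both views are computed from the same table, so adding a new type or a new operation is just another `define` call.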

On Arity #

As a sort of violent aside, I believe we should dispense with the idea of function “arity” completely. Every language I’ve ever used seems to treat arity as something sacrosanct: when you call a function there shalt be a “first argument” and a “second argument” etc. I would rather just pass “the arguments” as one big blob, and have the function sort it out if necessary. My gut feeling is that there are too many cases where an order is imposed but does not have semantic meaning. add(x, y) is generally a commutative operation. It should be order-less. Similarly, print(stream, string) doesn’t really have order, per se; it just has two separate things which need to be provided. I think it would be considerably cleaner for everyone involved to simply pass an object which is capable of providing both a stream and a string in the above case. I don’t have a wealth of examples here, and I only mention this “arity” issue as essentially a matter of taste: I feel we are expressing something in code which we have no intention of expressing, and it can occasionally lead to bugs or confusion.
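A rough sketch of what passing “one big blob” might look like in today’s JavaScript, using a single object argument. Here an array stands in for the stream and for an unordered bag of operands; both are assumptions made to keep the example self-contained.

```javascript
// The function picks out what it needs by role, not by position.
// `stream` is just an array standing in for a real output stream.
function print({ stream, string }) {
  stream.push(string);
}

const out = [];
print({ string: "hello", stream: out }); // field order is irrelevant
print({ stream: out, string: "world" });
console.log(out.join(" ")); // hello world

// A commutative add over a bag of operands: no "first" or "second".
// (An array is ordered, but the function never relies on that order.)
function add(operands) {
  let total = 0;
  for (const n of operands) total += n;
  return total;
}

console.log(add([2, 3]) === add([3, 2])); // true
```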

I expect a lot of upsetting communiqués from functional programming folks in this regard. The usual tools of map, reduce, foldr, etc. might look kludgy in this scenario - but I think they could be made to work quite simply as well.

On Dynamicism #

Certain astute engineers will now be expressing considerable skepticism around the idea of sort of ‘hard-wiring’ symbols and to what degree any semantic interpretation is really possible at edit time because… Dynamics!

Many interesting modern non-static languages come equipped with significant capabilities that allow the dynamic creation of functions and classes and so on in real-time. Many go further with comprehensive reflection and built-in interpreters, etc. These languages include JavaScript, Python, Ruby and countless more.

The main issue is that if objects are free to interpret messages (or attribute lookups, etc) dynamically, there is essentially very little semantic guidance that any structured editor can provide a programmer at edit-time. This problem is fundamental, and also affects the design of compilers etc., since behaviour remains essentially a mystery until the time of function invocation.

I have a proposed solution for this problem which is not possible to achieve in a text-based programming environment:

Require that the message itself - the message being sent to the object to invoke a function - is a first-class object, and actually instantiated at edit-time. (There will be maximally-dynamic times when this is not possible, but I won’t concern ourselves with those here.) What this achieves is that while the destination of the message remains hugely dynamic, the message itself is not only known, but actually a real instance at edit time. This means, among other things, that all of its parameters are known, its documentation is easily displayed, and, in the most extreme (and fantastic!) version of the world, we can allow function invocations to display their own user interface to the programmer!
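To make this a little more concrete, here is a minimal sketch of a message that exists as an object before anything is sent. The `Message` class, its fields, and the `connect` and `server` examples are all hypothetical; a real environment would create these at edit time rather than in program text.

```javascript
// A message is a real object that exists before it is ever sent: it knows
// its parameters and documentation, so an editor could inspect it or even
// build a form from it. All names here are hypothetical.
class Message {
  constructor(label, doc, params) {
    this.label = label;
    this.doc = doc;
    this.params = params; // e.g. { host: "string", port: "number" }
  }
  // List declared parameters that the supplied arguments do not provide.
  missing(args) {
    return Object.keys(this.params).filter((p) => !(p in args));
  }
  // Deliver to whatever receiver turns up at run time.
  send(receiver, args) {
    const gaps = this.missing(args);
    if (gaps.length) throw new Error(`${this.label}: missing ${gaps.join(", ")}`);
    const handler = receiver.handlers.get(this);
    if (!handler) throw new Error(`receiver does not understand ${this.label}`);
    return handler(args);
  }
}

const connect = new Message("connect", "Open a connection to a server.", {
  host: "string",
  port: "number",
});

// The receiver stays dynamic: handlers are looked up by message identity.
const server = {
  handlers: new Map([[connect, (a) => `connected to ${a.host}:${a.port}`]]),
};

// Before sending, tooling can already ask what is still needed.
console.log(connect.missing({ host: "example.org" })); // [ 'port' ]
console.log(connect.send(server, { host: "example.org", port: 80 }));
```

Because `connect` carries its own parameter list and documentation, an editor could render a prompt or dialog for it without knowing anything about the eventual receiver.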

I think this might actually be quite revolutionary. Imagine invoking an Amazon Web Services connection with a beautiful dialog box instead of a thousand pages of documentation. Or imagine invoking an HTTP server without having to figure out the incredible tedium that the HTTP server du jour requires you to pass it? It could prompt you!

This isn’t to say that the inputs to these user interfaces are necessarily static. I see no reason why expressions couldn’t be passed in just like any other function invocation; it could simply be accomplished in a more humane manner.

I’m extraordinarily excited at this prospect because I think it, more than perhaps some of the other syntactic cleanup that’s achieved by this approach, holds the promise of eliminating a major gumption trap in the engineer’s life:

I’ve been trying to establish this godforsaken connection for half an hour and I’ve tried every variation of what the documentation says, but it doesn’t work!

In today’s API-saturated programming life, it’s difficult for anyone in the industry to rely upon their familiar set of functions and libraries: there will always be new technologies that need to be adopted, and new services that need to be invoked.

By elevating message calling to first-class objects and - potentially - providing those invocations with their own interface, this enormous source of discouragement can be eliminated from an engineer’s day.

Conclusion #

There are a host of benefits to adopting a non-text representation of code which goes beyond an Abstract Syntax Tree. Several key problems which have plagued programming for more than four decades are resolved by this important change of perspective, including the Diamond Problem and the Expression Problem.

There is also the promise to change the way function invocation works by enriching function calls with custom interfaces and first-class messages.

Given the relatively radical agenda outlined here, there are many topics that haven’t been discussed at all, and are worthy of much future thought: source control, serialization, language-basis, compilation and so forth.

Whatever the challenges may prove to be, the potential herein is worth a vigorous exploration.